Comparing the Effectiveness of Treatment for Post-Traumatic Stress Disorder (PTSD) in Veterans and Non-Veterans

Promotion Research of Kirsten Reij

Authors
Affiliations
Kirsten Reij

Psytrec

Ernst Paul Swens

Swens Data Science

1 Data Preperation

We examined the data prior to conducting any statistical analysis. We started by reformatting and re-coding variables, followed by a matching procedure. Lastly, we present the descriptive statistics of the matched dataset.


1.1 Data Inspection

The following R packages were utilized.

library(lubridate)   # package to help calculate age of patients
library(tableone)    # package to help create descriptive table
library(magrittr)    # package to add additional pipe options
library(labelled)    # package to help with labeling columns
library(MatchIt)     # package for the matching procedure
library(ggplot2)     # package for creating fancy plots
library(ggpubr)      # package to combine fancy plots
library(table1)      # package to create table one
library(tidyr)       # package to make life easier
library(dplyr)       # package to make life easier

We performed feature reduction on the raw data, retaining only the variables veteran, starting_date, age, sex, pre_score, post_score, pre_diagnosis, post_diagnosis and duration. We re-coded the variables veteran, starting_date, and age. We created a missingness indicator follow_up_miss for the follow-up treatment score. There was one veteran removed without the pre-treatment diagnosis of PTSD.

# read in the raw data
load("data/data_raw.Rdata")

# select relevant columns, clean dates, list-wise omit missing data
vet <- data %>% 
  dplyr::select(
    IsVeteraan, geslacht, StartDatum, geboortedatum, BEHDAGEN_GEPLAND, 
    CAPS5Score_IN, CAPS5Score_TK, CAPS5Score_FU, diagnose_IN, diagnose_TK) %>% 
  dplyr::mutate(
    id              = row_number(),
    veteran         = as.factor(IsVeteraan),
    sex             = geslacht,
    starting_date   = as.Date(StartDatum, "%Y-%m-%d"),
    age             = trunc((as.Date(geboortedatum, "%Y-%m-%d") %--% 
                        Sys.Date()) / years(1)),
    pre_score       = CAPS5Score_IN,
    post_score      = CAPS5Score_TK,
    follow_up_miss  = !is.na(CAPS5Score_FU),
    pre_diagnosis   = factor(
                        diagnose_IN, levels = c(0, 1), 
                        labels = c("No PTSD", "PTSD")),
    post_diagnosis  = factor(
                        diagnose_TK, levels = c(0, 1), 
                        labels = c("No PTSD", "PTSD")),
    duration        = BEHDAGEN_GEPLAND,
    .keep = "none") %>%
  dplyr::filter(id != 47) %>%
  dplyr::filter(pre_diagnosis == "PTSD") %>%
  na.omit()

The description of the resulting dataset variables is as follows.

Variable Explanation Coding
veteran being a veteran factor
starting_date date start treatment date
sex sex of patient factor
age age of patient continuous
pre_score pre-treatment CAPS score continuous
post_score post-treatment CAPS score continuous
follow_up_miss missingness indicator for follow-up score factor
pre_diagnosis diagnosis of PTSD pre-treatment factor
post_diagnosis diagnosis of PTSD post-treatment factor
duration treatment duration factor

1.2 Matching

Before making statistical inferences, we took into account the class imbalance of veteran and the potential confounding effects of age, sex, and duration.

Warning in ModuleReturnVarsExist(vars, data): The data frame does not have:
stay Dropped
                 Stratified by veteran
                  0             1            p      test
  n                3848            43                   
  sex (mean (SD))  1.63 (0.48)   1.07 (0.26) <0.001     
  age (mean (SD)) 41.77 (12.49) 46.93 (8.29)  0.007     

We noticed a significant class imbalance with only 43 veterans and 3848 non-veterans. We should be careful with the confounding effect of age and sex, as both standardized mean differences are greater than 0.5. Therefore, we used a nearest neighbor matching procedure. The algorithm matched each veteran with the most similar non-veteran based on sex, age, starting_date, and duration.

# set seed to make stochastic results reproducible
set.seed(123)

# matching non-veterans 
match <- matchit(
  veteran ~ starting_date + sex + age + duration, 
  method  = "nearest",
  m.order = "largest",
  replace = FALSE,
  data    = vet)

# matched data set
vet <- match.data(match)

After the matching procedure, we observed a significant decrease in the standardized mean difference. This suggests that the matching procedure was successful in controlling the confounding effects of age, sex, and duration.

                      Stratified by veteran
                       0             1            p      test
  n                       43            43                   
  sex (mean (SD))       1.07 (0.26)   1.07 (0.26)  1.000     
  age (mean (SD))      46.95 (11.18) 46.93 (8.29)  0.991     
  duration (mean (SD))  7.00 (1.89)   7.14 (1.73)  0.722     

After the matching procedure, we added auxiliary data to describe the sample in more detail.


2 Patient Characteristics

The following table presents the patient characteristics.

Total
(N=86)
Non-Veteran
(N=43)
Veteran
(N=43)
p-value
Sex
Male 80 (93.0%) 40 (93.0%) 40 (93.0%) 1
Female 6 (7.0%) 3 (7.0%) 3 (7.0%)
Pre-treatment CAPS score
Mean (SD) 43.1 (7.69) 42.3 (8.04) 43.9 (7.33) 0.343
Median [Min, Max] 43.0 [25.0, 61.0] 41.0 [29.0, 61.0] 45.0 [25.0, 61.0]
Post-treatment CAPS score
Mean (SD) 15.2 (15.7) 16.8 (17.5) 13.5 (13.7) 0.34
Median [Min, Max] 10.5 [0, 68.0] 12.0 [0, 68.0] 8.00 [0, 48.0]
Difference post and pre-score
Mean (SD) -28.0 (15.7) -25.6 (17.0) -30.4 (14.0) 0.153
Median [Min, Max] -30.0 [-61.0, 18.0] -30.0 [-61.0, 18.0] -31.0 [-51.0, 2.00]
Age
Mean (SD) 46.9 (9.78) 47.0 (11.2) 46.9 (8.29) 0.991
Median [Min, Max] 46.0 [24.0, 68.0] 46.0 [24.0, 68.0] 46.0 [33.0, 67.0]
War violence
No 40 (46.5%) 39 (90.7%) 1 (2.3%) <0.001
Yes 46 (53.5%) 4 (9.3%) 42 (97.7%)
Sexual violence
No 41 (47.7%) 13 (30.2%) 28 (65.1%) 0.004
Yes 45 (52.3%) 30 (69.8%) 15 (34.9%)
Physical violence
No 13 (15.1%) 5 (11.6%) 8 (18.6%) 0.549
Yes 73 (84.9%) 38 (88.4%) 35 (81.4%)
Natural disasters and serious accidents
No 15 (17.4%) 12 (27.9%) 3 (7.0%) 0.019
Yes 71 (82.6%) 31 (72.1%) 40 (93.0%)
Mood disorders
No 34 (39.5%) 16 (37.2%) 18 (41.9%) 0.823
Yes 52 (60.5%) 27 (62.8%) 25 (58.1%)
Anxiety disorders
No 46 (53.5%) 21 (48.8%) 25 (58.1%) 0.5
Yes 40 (46.5%) 22 (51.2%) 18 (41.9%)
Suicide risk
None 25 (29.1%) 10 (23.3%) 15 (34.9%) 0.493
Low 36 (41.9%) 19 (44.2%) 17 (39.5%)
Medium 16 (18.6%) 8 (18.6%) 8 (18.6%)
High 7 (8.1%) 5 (11.6%) 2 (4.7%)
Missing 2 (2.3%) 1 (2.3%) 1 (2.3%)
Treatment duration
8 days or more 67 (77.9%) 33 (76.7%) 34 (79.1%) 1
Less than 8 days 19 (22.1%) 10 (23.3%) 9 (20.9%)


Including p-values in Table 1 may be easy, but it is not recommended. This is true whether the data is from an observational study or a randomized trial. If p-values are included to show crude associations between predictors and outcomes, they may not be relevant as adjusted assessments from regression analysis are usually more important. Including p-values in Table 1 to show differences in participant characteristics between exposure groups is not useful. What matters for confounding is the magnitude of differences in the sample, not if the groups differ in the population (as tested by p-values).


3 Missing Data

There are 46 veterans records in total, with one being a re-registration and another without PTSD diagnosis. Excluding these records, there are 45 veterans. Of those 45, 43 have a CAPS score after treatment and 23 have a follow-up CAPS score. The next section examines whether there are any differences between veterans who have a follow-up CAPS score and those who do not.


3.1 Missing Data Mechanisms

First, we will explain some technical terms. Then, we will give a more understandable explanation.

The variable \(R\) represents whether or not a follow-up CAPS score is missing, where \(R = 0\) represents a missing score and \(R = 1\) represents a known score. The distribution of \(R\) may depend on the data, represented by \(Y\). This data may contain complete and missing information, represented by \(Y = (Y_{obs}, Y_{mis})\). The missing data may be due to either design or circumstance, such as a forgotten confounding variable. The parameter of the missing data model is represented by \(\psi\), and the general expression of the missing data model is \(P(R=0|Y_{obs}, Y_{mis},\psi)\), which expresses the probability that the follow-up score is missing, given the observed and missing data, and the parameter of the missing data model.

We can now formulate the hypothesis that:

\[H_0: P(R=0|Y_{obs}, Y_{mis},\psi)=P(R=0|\psi).\]

This implies that the observations with missing follow-up scores are a random subset of all observations, determined by \(\psi\). Therefore, there are no systematic differences between the missing and observed follow-up scores. This is known as missing completely at random (MCAR).

The alternative hypothesis is formulated as:

\[H_1: P(R=0|Y_{obs}, Y_{mis},\psi) \ne P(R=0|\psi)\]

This means that the observations with missing follow-up scores are not a random subset of all observations. The missingness may depend on another observed variable, such as veterans with higher pre-treatment scores being less likely to be lost to follow-up. This is known as a missing at random (MAR) mechanism. If the missingness depends on something that was not measured, it is called missing not at random (MNAR).

It is not possible to definitively distinguish between MCAR, MAR, and MNAR based on the data alone. However, by analyzing the standardized mean differences between the two groups, we can infer which mechanism is most likely to be the cause of missing data. If the standardized mean differences are small, it suggests that the missing data is likely MCAR. If the standardized mean differences are larger, it suggests that the missing data may be MAR or MNAR.


3.2 Exploration

There are minimal variations in age, duration, pre_score, and post_score between the veterans with and without follow up scores. This is indicated by the SMD being less than 0.5.

                                 Stratified by follow_up_miss
                                  FALSE         TRUE          p      test
  n                                  20            23                    
  age (mean (SD))                 44.95 (7.86)  48.65 (8.45)   0.146     
  duration = Less than 8 days (%)     4 (20.0)      5 (21.7)   1.000     
  pre_score (mean (SD))           42.55 (8.68)  45.13 (5.84)   0.254     
  post_score (mean (SD))          12.05 (13.86) 14.83 (13.78)  0.515     

This is also supported by the figure which shows that the distributions of the groups overlap nicely.

In conclusion, the reason for the missing follow-up score appears to be random, supporting the assumption that \(H_0\) is true.


4 Marginal Analysis

In this chapter, we investigate the effectiveness of the treatment for both veterans and non-veterans by examining the pre- and post-treatment scores. The figure presents the CAPS scores for both groups, which show a decrease in the CAPS scores. Note, the error bars in the figure below shows the 95% confidence interval of the paired t-test. The following sections will determine if this decrease is statistically significant and clinically meaningful.


4.1 Veterans

For the veteran group, the following hypotheses can be formulated:

  • Null Hypothesis (\(H_0\)): The treatment has no effect on the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being greater than or equal to zero (\(CAPS_{post} - CAPS_{pre} \ge 0\)).

  • Alternative Hypothesis (\(H_1\)): The treatment does decrease the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being less than zero (\(CAPS_{post} - CAPS_{pre} < 0\)).

These hypotheses can be tested using a paired t-test. The effect size can be quantified by Cohen’s distance.


    Paired t-test

data:  post_score and pre_score
t = -14.238, df = 42, p-value < 2.2e-16
alternative hypothesis: true mean difference is less than 0
95 percent confidence interval:
      -Inf -26.80479
sample estimates:
mean difference 
      -30.39535 

The results from the pre-test (M = 43.9, SD = 7.3) and post-test (M = 13.5, SD = 13.7) CAPS interview indicate that the treatment significantly reduced symptoms of PTSD in veterans with war-related trauma, t(42) = -14.2, p < 0.001.


Cohen's d

d estimate: 2.763555 (large)
95 percent confidence interval:
  lower   upper 
2.16395 3.36316 

The effect size for this analysis (d = 2.76) was found to exceed Cohen’s (1988) convention for a large effect (d = .80).


4.2 Non-Veterans

For the non-veteran group, the following hypotheses can be formulated:

  • Null Hypothesis (\(H_0\)): The treatment has no effect on the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being greater than or equal to zero (\(CAPS_{post} - CAPS_{pre} \ge 0\)).

  • Alternative Hypothesis (\(H_1\)): The treatment does decrease the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being less than zero (\(CAPS_{post} - CAPS_{pre} < 0\)).

These hypotheses can be tested using a paired t-test. The effect size can be quantified by Cohen’s distance.


    Paired t-test

data:  post_score and pre_score
t = -9.8795, df = 42, p-value = 8.046e-13
alternative hypothesis: true mean difference is less than 0
95 percent confidence interval:
      -Inf -21.20696
sample estimates:
mean difference 
      -25.55814 

The results from the pre-test (M = 42.3, SD = 8) and post-test (M = 16.8, SD = 17.5) CAPS interview indicate that the treatment significantly reduced symptoms of PTSD in veterans with war-related trauma, t(42) = -9.9, p < 0.001.


Cohen's d

d estimate: 1.878293 (large)
95 percent confidence interval:
   lower    upper 
1.363465 2.393121 

The effect size for this analysis (d = 1.88) was found to exceed Cohen’s (1988) convention for a large effect (d = .80).


5 Reliable Change Index

The Reliable Change Index (RCI) is a statistical tool used to determine whether a change in a test score is meaningful or simply due to random measurement error. The RCI takes into account the reliability of the measure, the variability in the obtained scores in the group, and the change in the individual’s score and yields a pseudo–z-statistic. It is important to note that the RCI is only a rough estimate of the meaningfulness of a change and does not guarantee that a change is real or significant.

We computed the RCI using a test-retest reliability of 0.73 and the pooled standard deviations of the test scores. A patient was considered to have improved if the CAPS score decreased with 13.25 points. From the figure it is interesting to observe that for the non-veterans improvement and no PTSD diagnosis post-treatment appear to coincide, whereas this is not the case for veterans. There are veterans that did improve on their scores, but remained with the PTSD diagnosis post-treatment.

In the veteran group compared to the non-veteran group, more participants improved and less participants remained unchanged according to the RCI. In the non-veteran group, two people deteriorated.

             
              Deteriorated Unchanged Improved
  Non-Veteran            1        11       31
  Veteran                0         5       38

However, if we look at the diagnosis of PTSD post-treatment, we observe equal numbers.

             
              No PTSD PTSD
  Non-Veteran      34    9
  Veteran          32   11

6 Group Differences

In this chapter we will explore the differences between the veterans and non-veterans.

6.1 Statistical Analysis

In this study we investigate the variations in CAPS scores pre- and post-treatment among veterans and non-veterans, using a Bayesian repeated measures ANOVA. This method involves creating a model comparison that incorporates various fixed effects. The null model includes a fixed effect for time and random intercepts and slopes for time per person. The first model includes all the components of the null model, along with a fixed effect for veteran status. The second model includes all the components of the first model and also incorporates an interaction term between veteran status and time.

In our analysis, we use the Bayes factor to compare two distinct models by computing the ratio of their marginal likelihoods. In particular, we utilize the \(\text{BF}_{01}\), which gauges the relative strength of evidence in favor of the null model compared to an alternative model. To determine the degree of evidence strength, we follow the widely used guidelines proposed by Lee and Wagenmakers (2013).

In our analysis, we adopted an uninformative prior for the model prior, which entails assigning equal prior probability to each of the competing models before observing any data. This means that we did not express any prior preference or bias towards any particular model. By using an uninformative prior, we allowed the data to have a greater impact on the posterior distribution over models.

In our analysis, we employed the standard multivariate Cauchy distribution in JASP (version 0.17.1) as the prior for the effects. To ensure the robustness of our results, we also calculated various Bayes factors for a range of scale parameters, from 0.0 to 1.5, for the fixed effects.

6.2 Results

Model Comparisons
Models P(M) P(M|data) BF\(_\text{M}\) BF\(_\text{01}\) error %
Null model 0.333 0.739 5.664 1
Veteran 0.333 0.18 0.44 4.097 2.456
Veteran + Veteran × Time 0.333 0.081 0.175 9.173 2.698

According to the first table, the model probability of the null model increased after observing the data, while the probabilities of the other models decreased. Specifically, the data were four times more likely to have originated from the null model than from model one, and nine times more likely to have come from the null model than from model two. These findings suggest moderate evidence in favor of the null hypothesis, indicating that there is no fixed effect of veteran status and no interaction between veteran status and time.

Analysis of Effects
Effects P(incl) P(excl) P(incl|data) P(excl|data) BF\(_\text{excl}\)
veteran 0.667 0.333 0.261 0.739 5.664
Time ✻ veteran 0.333 0.667 0.081 0.919 5.706

In the second table, we observe that when we average across the models, the evidence for excluding the fixed effects is 5.5 times greater than the evidence for including them. This suggests that the data support the null hypothesis more strongly than the alternative hypotheses that include the fixed effects.

Model Averaged Posterior Summary
Variable Level Mean SD Lower CI Upper CI
Intercept 29.49 1.066 27.353 31.592
Veteran Non-Veteran 0.661 1.017 -1.401 2.686
Veteran -0.661 1.017 -2.730 1.357
Time Pre-score 14.074 0.828 12.410 15.688
Post-score -14.074 0.828 -15.832 -12.516
Veteran × Time Non-Veteran & Pre-score -0.902 0.780 -2.488 0.656
Non-Veteran & Post-score 0.902 0.780 -0.669 2.475
Veteran & Pre-score 0.902 0.780 -0.669 2.475
Veteran & Post-score -0.902 0.780 -2.488 0.656

The final table displays the Model Averaged Posterior Summary, which presents the 95% credible intervals for the fixed effects. Notably, the credible intervals for the fixed effects of veteran status and the interaction between time and veteran status include zero.

6.3 Robustness

In Bayesian analysis with JASP, the choice of Cauchy prior width for the fixed effects can impact the strength of evidence supporting the null hypothesis. Setting the prior width to a small value suggests a strong belief in the effect of fixed factors, resulting in less mass in the posterior distribution close to zero, thus weaker evidence for the null hypothesis. Conversely, a larger prior width implies weaker belief in the effect of fixed factors and more mass close to zero, resulting in stronger evidence supporting the null hypothesis.

The figure above displays the robustness of evidence across different Cauchy prior widths, indicating moderate to strong evidence in favor of the null model across all plausible scale parameters according to Kruschke (2011).

7 References

Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299-312. https://doi.org/10.1177/1745691611406925

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press.